Forecasting the prices of staple foods such as cabbage and potatoes matters because of their direct impact on a country’s economy and society. Nepal has a large agricultural sector that contributes a substantial share of the country’s GDP, so staple food prices affect not only farmers’ incomes but also the wider economy. These prices are also worth forecasting because they indicate consumer purchasing power and the general rate of inflation. Accurate price forecasts support informed decisions, reduce risk, and promote economic stability.
Main goal: accurately forecast the prices of key vegetables in Nepal over a specific time horizon.
Dataset: The data set consists of vegetable prices in Nepal, where each vegetable is its own time series. The data is daily in nature, but it has been aggregated to monthly frequency for this analysis and forecasting project.
Loading the data sets: fruit and vegetable prices in Nepal, as well as the Consumer Price Index (CPI) for Nepal. The fruits and vegetables table contains the average price of each commodity on a given day. For the time series analysis this is aggregated to the monthly level by averaging the daily average prices within each month.
The generated tsibble looks like this.
# Loading Datasets.
veggies <- read_csv("data_sources/kalimati_tarkari_dataset.csv",
col_types = cols(Date = col_date(format = "%Y-%m-%d")))
suppressWarnings({
cpi <- read_excel("data_sources/Nepal CPI Trading Economics.xlsx",
col_types = c("date", "numeric"))
})
cpi <- cpi |>
filter(Date >= '2013-08-01 00:00:00')
# tidy up cpi
base_index <- cpi$CPI[1]
cpi <- cpi |>
mutate(Date = yearmonth(Date),
CPI = CPI / base_index) |>
select(Date, CPI)
suppressMessages({
veggies <- veggies %>%
filter(Unit == "Kg") %>%
filter(Commodity %in% c("Tomato Big(Nepali)", "Potato Red",
"Onion Dry (Indian)", "Carrot(Local)",
"Cabbage(Local)")) %>%
mutate(Date = yearmonth(Date)) %>%
select(Commodity, Date, Average) %>%
group_by(Date, Commodity) %>%
summarize(Average = mean(Average)) %>%
ungroup()
})
veggies_ts <- veggies |>
as_tsibble(key = Commodity,
index = Date)
head(veggies_ts, 5)
## # A tsibble: 5 x 3 [1M]
## # Key: Commodity [1]
## Date Commodity Average
## <mth> <chr> <dbl>
## 1 2013 Jun Cabbage(Local) 11.1
## 2 2013 Jul Cabbage(Local) 16.2
## 3 2013 Aug Cabbage(Local) 35.7
## 4 2013 Sep Cabbage(Local) 42.4
## 5 2013 Oct Cabbage(Local) 32.2
This plot shows the average monthly price in Nepalese Rupees of 5 vegetables in Nepal.
veggies_ts |>
autoplot(Average) +
labs(y="Nepalese Rupees",
title="Average price per month of various vegetables in Nepal") +
theme_bw()
The following plot shows the average monthly price of Potato Red in Nepal. The series spikes around mid-year across all observed years, and also shows a sharp price increase in 2020.
potato <- veggies_ts |>
filter(Commodity == "Potato Red")
potato |>
autoplot(Average) +
labs(y="Nepalese Rupee",
title="Average Price of Potato Red in Nepal") +
theme_bw()
Moreover, the drastic price increase of 2020 can be seen in the following plot: 2020 has the highest Potato Red price in the second half of the year. The plot also shows that, on average, potato prices rise in the second half of the year.
potato |>
gg_season(Average) +
labs(y="Nepalese Rupee",
title="Average Price of Potato Red in Nepal") +
theme_bw()
The price appears to have a seasonal component but also some dependence on the previous months’ prices. The following plot shows the autocorrelation function of the potato price in Nepal. The price is highly correlated with the previous month’s price, and the correlation remains strong for about three lags before decreasing.
potato |>
ACF(Average, lag_max = 15) |>
autoplot() +
labs(title="Autocorrelation plot") +
theme_bw()
This plot shows the average monthly price of Carrot(Local) in Nepal. Local carrot prices bottom out around the beginning of each year, indicating relatively strong seasonality; this is plausible because agricultural prices shift with harvest timing and demand. The plot also shows a spike roughly every two years, suggesting possible cyclicality. The size of the annual fluctuations changes as well, with larger swings since 2016. The overall trend rises initially and then declines slightly.
Carrot <- veggies_ts %>%
filter(Commodity == "Carrot(Local)")
Carrot %>%
autoplot(Average) +
labs(y="Nepalese Rupee",
title="Average Price of Carrot(Local) in Nepal") +
theme_bw()
This plot shows the average price of local carrot in Nepal by month. September has the highest price on average. Prices then begin to fall slowly in October, drop in December, and continue falling into February and March.
Carrot %>%
gg_subseries(Average) +
labs(y="Nepalese Rupee",
title="Average Price of Carrot(Local) in Nepal") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
The initial plot of the onion series shows strong cyclic behavior. This is not uncommon in agriculture: most vegetables are sensitive to weather conditions, and prices depend on harvest timing and on crop rotations, where farmers alternate crops to maintain soil fertility. Agriculture is, generally speaking, sensitive to many outside factors. We can see a pronounced spike at the end of every second year and a dip at the beginning of every second year; the two-year interval makes the cyclic behavior very apparent. Between these swings the price is relatively stable, with possible seasonality. No trend is obvious to the naked eye, although prices would naturally be expected to drift upward over time, like most prices.
Onion <- veggies_ts %>%
filter(Commodity == "Onion Dry (Indian)")
Onion %>%
autoplot(Average) +
labs(y="Nepalese Rupee",
title="Average Price of Onion") +
theme_bw()
This plot shows the average price of cabbage in Nepal by month. The month with the highest price, on average, is November; the price then decreases rapidly from December through March. This confirms the pattern seen in the previous plot.
Cabbage <- veggies_ts %>%
filter(Commodity == "Cabbage(Local)")
Cabbage %>%
gg_subseries(Average) +
labs(y="Nepalese Rupee",
title="Average Price Cabbage local in Nepal") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
It’s important to note that economic series tend to increase over time, especially prices. Therefore, an adjustment is necessary to compare prices across time. In this analysis the CPI of Nepal is used as a deflator; the data was obtained from INSERT REFERENCE and the base period for comparison is August 2013.
potato_constant_prices <- potato |>
inner_join(cpi, by = join_by(Date)) |>
mutate(Average_constant_prices = Average / CPI) |>
select(Date, Commodity, Average_constant_prices)
head(potato_constant_prices, 5)
## # A tsibble: 5 x 3 [1M]
## # Key: Commodity [1]
## Date Commodity Average_constant_prices
## <mth> <chr> <dbl>
## 1 2013 Aug Potato Red 28.1
## 2 2013 Sep Potato Red 30.3
## 3 2013 Oct Potato Red 36.6
## 4 2013 Nov Potato Red 38.9
## 5 2013 Dec Potato Red 30.8
The following plots compare the transformations. The first plot is the average price divided by the deflator; the second adds a Box-Cox transformation on top of that; the last applies a log transformation to the deflated series instead. The Box-Cox transformation of the constant-price series seems to be the best choice, as it somewhat reduces the spikes.
lambda_guerrero_constant_prices <- potato_constant_prices |>
features(Average_constant_prices, features = guerrero) |>
pull(lambda_guerrero)
# Inflation only
p1 <- potato_constant_prices |>
autoplot(Average_constant_prices) +
labs(y="Constant Prices",
title="Average Price of Potato Red, 2013 prices in Nepal") +
theme_bw()
# Box Cox
p2 <- potato_constant_prices |>
autoplot(box_cox(Average_constant_prices, lambda_guerrero_constant_prices)) +
labs(y="Box Cox",
title="Box Cox Transformation of Potato Red, 2013 prices in Nepal") +
theme_bw()
p3 <- potato_constant_prices |>
autoplot(log(Average_constant_prices)) +
labs(y="Log",
title="Average Log Price of Potato Red, 2013 prices in Nepal") +
theme_bw()
grid.arrange(p1, p2, p3, ncol = 1)
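For reference, the Box-Cox transformation applied above has a simple closed form: log(y) when lambda is 0, and (y^lambda - 1)/lambda otherwise. A minimal base-R sketch, for illustration only (fabletools’ box_cox() implements essentially this formula, with extra handling for negative values):

```r
# Box-Cox transform written out in base R, for reference only.
box_cox_manual <- function(y, lambda) {
  if (abs(lambda) < 1e-10) log(y) else (y^lambda - 1) / lambda
}
box_cox_manual(c(10, 20, 40), 0)    # reduces to a pure log transform
box_cox_manual(c(10, 20, 40), 0.5)  # milder compression of large values
```

Note that lambda = 1 leaves the shape of the series unchanged (it only shifts it by 1), which is why a Guerrero estimate near 1 signals that no transformation is needed.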
The following plot shows, in gray, the Box-Cox transformation of the average potato price in Nepal, together with the trend from its STL decomposition. There is no clear upward or downward trend; the pattern looks more cyclical, reaching its highest point around mid-year in even years and its lowest point around mid-year in odd years.
stl_dcmp <- potato_constant_prices |>
model(stl = STL(box_cox(Average_constant_prices, lambda_guerrero_constant_prices)))
potato_constant_prices %>%
autoplot(box_cox(Average_constant_prices, lambda_guerrero_constant_prices), color='gray') +
autolayer(components(stl_dcmp), trend, color='red') +
ylab("Box Cox Transformation of Average Price") +
ggtitle("STL Decomposition of Average Price of Potato Red in Nepal") +
theme_bw()
The following plot shows the X-13ARIMA-SEATS decomposition. The main differences lie in the trend component and the residuals: this model appears to capture more variability in the trend component, and the residuals appear to be only white noise.
seats_dcmp <- potato_constant_prices |>
model(X_13ARIMA_SEATS(box_cox(Average_constant_prices, lambda_guerrero_constant_prices)))
components(seats_dcmp) |>
autoplot()
Looking at the previous plot of the cabbage, there is no clear indication that any type of transformation is needed. A mathematical transformation (such as a log transformation) is mainly useful when the variation grows with the level of the series, which is not obviously the case here. We could also consider a Box-Cox transformation, which is used to normalize skewed data and stabilize the variance; we can check how heteroskedastic the data is with the Breusch-Pagan test before deciding whether a Box-Cox transformation is a good choice.
The Breusch-Pagan test suggests (somewhat weakly) that we do not have heteroskedasticity: the p-value of 0.2754 is greater than 0.05, so we fail to reject the null hypothesis at the 0.05 significance level. In other words, we cannot conclude that there is evidence of heteroskedasticity in the data.
We have now both inspected the data visually and run the Breusch-Pagan test, but we can also inspect the effect of the Box-Cox transformation itself. Comparing the series before and after the transformation, it possibly did account for some of the variation, so applying it would not be out of order.
Cabbage_constant_prices <- Cabbage |>
inner_join(cpi, by = join_by(Date)) |>
mutate(Average_constant_prices = Average / CPI) |>
select(Date, Commodity, Average_constant_prices)
head(Cabbage_constant_prices, 5)
## # A tsibble: 5 x 3 [1M]
## # Key: Commodity [1]
## Date Commodity Average_constant_prices
## <mth> <chr> <dbl>
## 1 2013 Aug Cabbage(Local) 35.7
## 2 2013 Sep Cabbage(Local) 42.3
## 3 2013 Oct Cabbage(Local) 32.0
## 4 2013 Nov Cabbage(Local) 41.9
## 5 2013 Dec Cabbage(Local) 28.6
lm_model <- lm(Average_constant_prices ~ Date, data = Cabbage_constant_prices)
bptest(lm_model)
##
## studentized Breusch-Pagan test
##
## data: lm_model
## BP = 1.1899, df = 1, p-value = 0.2754
Cabbage_constant_prices %>%
features(Average_constant_prices, features = guerrero) %>%
pull(lambda_guerrero)-> lambda
The following plots compare the data with and without the Box-Cox transformation. As seen, it did account for some variance, but ultimately we decided not to keep the Box-Cox-transformed cabbage data.
Cabbage_constant_prices %>% autoplot(Average_constant_prices)
Cabbage_constant_prices %>%
autoplot(box_cox(Average_constant_prices, lambda))
transformed_cabbage <- Cabbage_constant_prices %>%
mutate(Average_constant_prices_transformed = box_cox(Average_constant_prices, lambda))
We now try and evaluate different decomposition models for the cabbage data. The decomposition that seemingly does best is the X_13ARIMA_SEATS model: from the following plots it visually appears to capture more variability in the trend component. Looking at the residuals, however, it might not lead to an optimal forecasting model for the cabbage data, since there are still four significant lags, three of them seasonal.
Cabbage_constant_prices %>%
model(STL1 = STL(Average_constant_prices),
STL2 = STL(Average_constant_prices ~ trend(window = 7) + season(window = 5), robust = TRUE),
fixed = STL(Average_constant_prices ~ season(window = "periodic"), robust = TRUE),
x11 = X_13ARIMA_SEATS(Average_constant_prices),
seats = X_13ARIMA_SEATS(Average_constant_prices ~ seats()))%>%
components() -> dcmpCabbage
dcmpCabbage %>% as_tsibble() %>% autoplot(trend)
dcmpCabbage %>% as_tsibble() %>% autoplot(season_adjust)
autoplot(dcmpCabbage)
# Residual diagnostics for a random-walk-with-drift benchmark on the cabbage series.
Cabbage_drift <- Cabbage_constant_prices %>%
model(RW(Average_constant_prices ~ drift()))
Cabbage_drift %>% gg_tsresiduals()
For forecasting we will create four simple models: seasonal naive, naive, drift and mean. For the mean model, the future values equal the mean of the historical data. In the naive model, the forecast equals the last value of the time series. Drift is a random walk: the last value plus the average historical change. Finally, in the seasonal naive model, the forecast equals the last value from the same season. We create a mable with the four basic models.
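These four rules are simple enough to compute by hand. The following base-R sketch spells out each formula on a made-up two-year monthly series (not the potato data):

```r
# Toy monthly series: two years of made-up prices, seasonal period m = 12.
y <- c(10, 12, 15, 11, 13, 16, 12, 14, 17, 13, 15, 18,
       11, 13, 16, 12, 14, 17, 13, 15, 18, 14, 16, 19)
T <- length(y); h <- 3; m <- 12   # forecast horizon of 3 months

mean_fc   <- rep(mean(y), h)                        # Mean: historical average
naive_fc  <- rep(y[T], h)                           # Naive: last observation
drift_fc  <- y[T] + (1:h) * (y[T] - y[1]) / (T - 1) # Drift: last + avg change
snaive_fc <- y[T - m + (1:h)]                       # SNaive: same month, last year
```

The drift forecast is the last value extrapolated along the line through the first and last observations, which is why it collapses to the naive forecast when the overall trend is near zero.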
four_basic_models <- potato_constant_prices|>
model(
Seasonal_naive = SNAIVE(Average_constant_prices),
Naive = NAIVE(Average_constant_prices),
Drift = RW(Average_constant_prices ~ drift()),
Mean = MEAN(Average_constant_prices))
four_basic_models
## # A mable: 1 x 5
## # Key: Commodity [1]
## Commodity Seasonal_naive Naive Drift Mean
## <chr> <model> <model> <model> <model>
## 1 Potato Red <SNAIVE> <NAIVE> <RW w/ drift> <MEAN>
The following plot shows the models over a 12-month forecast horizon. It is evident that none of the models performs particularly well. There is clear seasonality in the data, with prices increasing in the second half of the year, which the seasonal naive captures. However, there is no guarantee that the unfortunate production slump of 2020 will repeat from 2021 into 2022. It is also interesting that the drift method, which accounts for the trend, is almost identical to the naive: the trend is negative and close to 0.
four_basic_models |>
forecast(h = "12 months") |>
autoplot(potato_constant_prices, level = NULL) +
ggtitle("12 Months forecast for Average Potato Price, 2013 prices.") +
ylab("Nepalese Rupees") +
guides(colour = guide_legend(title = "Forecast")) +
theme_bw()
The following four plots show each model’s fitted values against the actual data. The worst model, the one with the largest residuals, is clearly the mean: it ignores any pattern in the data and just takes the sample mean. The naive and drift models follow the noise of the data closely, but with a lag of one period. The seasonal naive is able to somewhat capture the seasonality, but has large errors in 2015, 2018 and 2020 due to changes in the seasonal pattern from one year to the next.
models <- c("Seasonal_naive", "Naive", "Drift", "Mean")
plots <- map(models, ~ four_basic_models |>
select(Commodity, all_of(.x)) |>
augment() |>
ggplot(aes(x = Date)) +
geom_line(aes(y = Average_constant_prices, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
ggtitle(sprintf("Data and %s model fitted", .x)) +
ylab("Nepalese Rupees"))
# Suppress warning messages while printing each plot
walk(plots, ~suppressWarnings(print(.)))
The following four plots show the residuals of the four basic models. No model reduces the error to white noise: all still exhibit autocorrelation in the errors, as seen in the ACF plots, and the errors appear non-normally distributed, except for the seasonal naive model. The conclusion is that these models are not appropriate for modeling or forecasting this series.
# Residual diagnostics, repeated per model (gg_tsresiduals() takes one model at a time).
four_basic_models |>
select(Commodity, Seasonal_naive) |>
gg_tsresiduals() +
ggtitle("residuals of Seasonal_naive")
four_basic_models %>%
select(Commodity, Naive) %>%
gg_tsresiduals()+
ggtitle("residuals of Naive")
four_basic_models %>%
select(Commodity, Drift) %>%
gg_tsresiduals()+
ggtitle("residuals of Drift")
four_basic_models %>%
select(Commodity, Mean) %>%
gg_tsresiduals()+
ggtitle("residuals of Mean")
To test and compare the accuracy of the four simple models, an expanding-window cross-validation is used: the initial window is 18 months, growing by 1 month at each step. The first table shows accuracy when forecasting a 12-month horizon, and the second when forecasting only 1 month ahead. The results confirm the conclusion reached above from the fitted values. The drift and naive models give similar results and are good at short horizons, since both follow the noise of the data; over longer horizons such as 12 months they do not perform as well. The seasonal naive model performs worse than naive and drift at short horizons, but much better over longer ones such as 12 months. However, from the residuals we concluded that none of these methods produces residuals that resemble white noise.
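The scheme that stretch_tsibble() implements below can be pictured as a set of expanding training windows. A small base-R illustration of the window end-points (the 94 here is illustrative, not the exact length of the potato series):

```r
# Expanding-window CV: the first window uses observations 1..18, then 1..19, etc.
n_obs <- 94                   # illustrative series length
window_ends <- seq(18, n_obs) # .init = 18, .step = 1
length(window_ends)           # number of training sets, i.e. model refits
range(window_ends)            # first and last window end-points
```

Each window is then fitted separately, so the accuracy tables average forecast errors over many different origins rather than a single train/test split.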
fit_cv <- potato_constant_prices |>
stretch_tsibble(.init = 18, .step = 1) |>
model(
Seasonal_naive = SNAIVE(Average_constant_prices),
Naive = NAIVE(Average_constant_prices),
Drift = RW(Average_constant_prices ~ drift()),
Mean = MEAN(Average_constant_prices))
fc_cv <- fit_cv %>%
forecast(h=12)
suppressMessages({fc_cv %>%
accuracy(potato_constant_prices) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 4 × 7
## .model .type RMSE MAE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift Test 13.3 10.2 38.4 1.43 1.47
## 2 Mean Test 10.4 7.64 28.6 1.07 1.15
## 3 Naive Test 12.5 9.40 35.4 1.32 1.39
## 4 Seasonal_naive Test 8.71 6.80 24.4 0.955 0.965
fc_cv <- fit_cv %>%
forecast(h=1)
suppressMessages({fc_cv %>%
accuracy(potato_constant_prices) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 4 × 7
## .model .type RMSE MAE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift Test 5.58 4.24 15.3 0.596 0.618
## 2 Mean Test 10.3 7.73 30.4 1.09 1.14
## 3 Naive Test 5.54 4.21 15.2 0.590 0.613
## 4 Seasonal_naive Test 9.01 7.08 26.5 0.995 0.998
In the following plot we forecast the average price of Carrot(Local) 12 months ahead using the models introduced above. Unlike the naive model, the drift model predicts a slight price decrease. The seasonal naive model suggests the price will repeat the previous pattern, decreasing at the beginning of the year and then increasing. But we cannot be sure whether the price will follow the previous cycle or trend downward.
Carrot_constant_prices <- Carrot %>%
inner_join(cpi, by = join_by(Date)) %>%
mutate(Average_constant_prices = Average / CPI) %>%
select(Date, Commodity, Average_constant_prices)
models <- Carrot_constant_prices %>%
model(
Seasonal_naive = SNAIVE(Average_constant_prices),
Naive = NAIVE(Average_constant_prices),
Drift = RW(Average_constant_prices ~ drift()),
Mean = MEAN(Average_constant_prices))
models %>%
forecast(h = 12) %>%
autoplot(Carrot_constant_prices, level = NULL) +
ylab("Nepalese Rupees") +
ggtitle("12 Months forecast of Average Carrot(Local) Price (2013 prices)") +
guides(colour = guide_legend(title = "Forecast models")) +
theme_bw()
To find models that might suit the onion data for forecasting, we evaluate the naive, seasonal naive, drift and mean models. The series has 94 rows in total; we hold out the last 60 for testing and train on the first 34. The best model on the training set (lowest RMSE) is the drift model, though it is barely better than the naive. The residuals still do not resemble white noise.
Onion_constant_prices <- Onion |>
inner_join(cpi, by = join_by(Date)) |>
mutate(Average_constant_prices = Average / CPI) |>
select(Date, Commodity, Average_constant_prices)
num_rows <- nrow(Onion_constant_prices)
print(num_rows)
## [1] 94
train_onion <- Onion_constant_prices %>% slice(1:(n() - 60))
fit_onion <- train_onion %>%
model(
Naive = NAIVE(Average_constant_prices),
Seasonal_Naive = SNAIVE(Average_constant_prices),
Drift = RW(Average_constant_prices ~ drift()),
Mean = MEAN(Average_constant_prices)
)
fit_onion %>% accuracy() %>% arrange(RMSE)
## # A tibble: 4 × 11
## Commodity .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Onion Dry (… Drift Trai… 2.20e-15 12.6 8.03 -2.35 17.4 0.394 0.449 0.430
## 2 Onion Dry (… Naive Trai… -1.66e+ 0 12.8 7.95 -7.00 17.5 0.390 0.453 0.430
## 3 Onion Dry (… Mean Trai… 8.36e-16 22.0 17.2 -21.2 41.7 0.842 0.782 0.791
## 4 Onion Dry (… Seaso… Trai… -4.56e+ 0 28.2 20.4 -26.4 51.7 1 1 0.795
Looking specifically at forecasting performance over a 24-month horizon (chosen because of the cyclic behavior), it is clear that no model performs well; even the best model is poor, as evidenced by residuals that do not resemble white noise.
fit_onion %>% forecast(h = 24) %>% accuracy(Onion_constant_prices)%>% arrange(RMSE)
## # A tibble: 4 × 11
## .model Commodity .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Naive Onion Dr… Test 8.77 16.7 9.29 18.4 21.4 0.456 0.593 0.832
## 2 Mean Onion Dr… Test -15.9 21.3 20.0 -86.4 93.1 0.982 0.757 0.832
## 3 Seasonal_Nai… Onion Dr… Test -15.0 30.3 22.5 -77.5 91.9 1.11 1.08 0.791
## 4 Drift Onion Dr… Test 29.5 37.0 29.5 98.0 98.0 1.45 1.32 0.899
# Naive has the lowest test-set RMSE, so we inspect its residuals.
onion_naive <- Onion_constant_prices %>% model(NAIVE(Average_constant_prices))
onion_naive %>% gg_tsresiduals()
From the previous plots and models it is clear that the data exhibits strong seasonality but no sustained upward or downward trend, only seasonal variability. Therefore, the best model should include seasonality and possibly no trend. Since the plots show that prices depend heavily on the season and on recent prices, we expect alpha to be close to 1, weighting the most recent observations more heavily.
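The A,N,A specification fitted below follows a simple recursion: the forecast is level plus seasonal state, the level is then nudged by a fraction alpha of the error, and the seasonal state by a fraction gamma. A hand-rolled sketch of one update step (the alpha and gamma defaults here are illustrative, not the fitted values):

```r
# One step of additive-error, no-trend, additive-season exponential smoothing.
# l is the level; s holds 12 seasonal states, with s[1] applying to the next month.
ets_ana_step <- function(y, l, s, alpha = 0.95, gamma = 0.001) {
  fc    <- l + s[1]                    # one-step forecast
  e     <- y - fc                      # additive error
  l_new <- l + alpha * e               # level update: with alpha near 1, the level
                                       # tracks the latest deseasonalized value
  s_new <- c(s[-1], s[1] + gamma * e)  # rotate seasonals; refresh the one just used
  list(forecast = fc, level = l_new, seasonal = s_new)
}
step <- ets_ana_step(y = 30, l = 28, s = c(2, rep(0, 11)))
step$forecast  # 28 + 2
```

This makes the narrative concrete: with alpha close to 1, next month’s forecast is essentially this month’s observation shifted by the seasonal term.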
fit_my_ets <- potato_constant_prices |>
model(my_ets = ETS(Average_constant_prices ~ error("A") + trend("N") + season("A")))
As expected, alpha is close to 1, meaning the most recent observations are weighted heavily.
fit_my_ets |>
tidy()
## # A tibble: 15 × 4
## Commodity .model term estimate
## <chr> <chr> <chr> <dbl>
## 1 Potato Red my_ets alpha 0.997
## 2 Potato Red my_ets gamma 0.000129
## 3 Potato Red my_ets l[0] 28.1
## 4 Potato Red my_ets s[0] 2.31
## 5 Potato Red my_ets s[-1] -2.56
## 6 Potato Red my_ets s[-2] -6.07
## 7 Potato Red my_ets s[-3] -6.91
## 8 Potato Red my_ets s[-4] -9.70
## 9 Potato Red my_ets s[-5] -9.40
## 10 Potato Red my_ets s[-6] -3.81
## 11 Potato Red my_ets s[-7] 2.36
## 12 Potato Red my_ets s[-8] 8.48
## 13 Potato Red my_ets s[-9] 12.9
## 14 Potato Red my_ets s[-10] 7.96
## 15 Potato Red my_ets s[-11] 4.46
The model appears to leave white-noise residuals and exhibits clear, stable seasonality, with prices increasing towards the end of the year.
fit_my_ets |>
components() |>
autoplot()
The model appears to fit better than the other models. It still exhibits some autocorrelation in the errors, but they appear normally distributed with mean 0.
fit_my_ets |>
gg_tsresiduals()
fit_ets <- potato_constant_prices |>
model(ets = ETS())
## Model not specified, defaulting to automatic modelling of the
## `Average_constant_prices` variable. Override this using the model formula.
The model selected automatically by fpp3 is close to what we specified: it also has no trend, though with multiplicative rather than additive components (note the seasonal terms near 1) and slightly different parameter estimates.
fit_ets |>
tidy()
## # A tibble: 15 × 4
## Commodity .model term estimate
## <chr> <chr> <chr> <dbl>
## 1 Potato Red ets alpha 1.00
## 2 Potato Red ets gamma 0.000101
## 3 Potato Red ets l[0] 25.1
## 4 Potato Red ets s[0] 1.04
## 5 Potato Red ets s[-1] 0.885
## 6 Potato Red ets s[-2] 0.777
## 7 Potato Red ets s[-3] 0.741
## 8 Potato Red ets s[-4] 0.663
## 9 Potato Red ets s[-5] 0.677
## 10 Potato Red ets s[-6] 0.867
## 11 Potato Red ets s[-7] 1.11
## 12 Potato Red ets s[-8] 1.34
## 13 Potato Red ets s[-9] 1.44
## 14 Potato Red ets s[-10] 1.30
## 15 Potato Red ets s[-11] 1.16
The seasonality matches our model, remaining constant over the observed period with prices increasing towards the end of the year, and the residuals appear to be white noise.
fit_ets |>
components() |>
autoplot()
The p-value of the Ljung-Box test of the potato series is 0.29, therefore we fail to reject the null hypothesis, suggesting that there is no significant evidence of autocorrelation in the residuals.
augment(fit_ets) %>%
features(.resid, ljung_box, lag=24, dof=0)
## # A tibble: 1 × 4
## Commodity .model lb_stat lb_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Potato Red ets 27.3 0.289
The tables compare the exponential smoothing model with the four basic models; the first table forecasts a 12-month horizon and the second only 1 month. The ETS model performs much better than the simple methods at both horizons, outperforming them on almost every metric, and its residuals also appear to be white noise.
fit_cv <- potato_constant_prices |>
stretch_tsibble(.init = 18, .step = 1) |>
model(
Seasonal_naive = SNAIVE(Average_constant_prices),
Naive = NAIVE(Average_constant_prices),
Drift = RW(Average_constant_prices ~ drift()),
Mean = MEAN(Average_constant_prices),
ETS = ETS())
## Model not specified, defaulting to automatic modelling of the
## `Average_constant_prices` variable. Override this using the model formula.
fc_cv <- fit_cv %>%
forecast(h=12)
suppressMessages({fc_cv %>%
accuracy(potato_constant_prices) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 5 × 7
## .model .type RMSE MAE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift Test 13.3 10.2 38.4 1.43 1.47
## 2 ETS Test 8.31 6.47 24.7 0.909 0.920
## 3 Mean Test 10.4 7.64 28.6 1.07 1.15
## 4 Naive Test 12.5 9.40 35.4 1.32 1.39
## 5 Seasonal_naive Test 8.71 6.80 24.4 0.955 0.965
fc_cv <- fit_cv %>%
forecast(h=1)
suppressMessages({fc_cv %>%
accuracy(potato_constant_prices) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 5 × 7
## .model .type RMSE MAE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift Test 5.58 4.24 15.3 0.596 0.618
## 2 ETS Test 4.22 3.23 11.6 0.453 0.468
## 3 Mean Test 10.3 7.73 30.4 1.09 1.14
## 4 Naive Test 5.54 4.21 15.2 0.590 0.613
## 5 Seasonal_naive Test 9.01 7.08 26.5 0.995 0.998
We now choose an ETS model for the cabbage series. We have already looked at the ACF and seen that the data has no clear trend but does have a seasonal component whose size grows with the level of the series. With this in mind we choose the M,N,M specification: multiplicative error, no trend and multiplicative seasonality.
We also fit Holt’s linear method and simple exponential smoothing (SES) for comparison. Of these three models, the MNM model we picked does best; it is also the model the automatic selection chooses.
The MNM model seems fine: the residuals resemble white noise and are somewhat normally distributed.
Cabbage_ExponentialSmoothing <- Cabbage_constant_prices %>%
model(MNM = ETS(Average_constant_prices ~ error("M") + trend("N") + season("M")),
holt = ETS(Average_constant_prices ~ error("A") + trend("A") + season("N")),
ses = ETS(Average_constant_prices ~ error("A") + trend("N") + season("N")),
ETS = ETS(Average_constant_prices))
report(Cabbage_ExponentialSmoothing)
## # A tibble: 4 × 10
## Commodity .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cabbage(Local) MNM 0.0883 -372. 773. 779. 811. 37.9 88.9 0.222
## 2 Cabbage(Local) holt 57.9 -402. 814. 815. 827. 55.4 119. 5.45
## 3 Cabbage(Local) ses 56.3 -402. 810. 810. 818. 55.1 119. 5.40
## 4 Cabbage(Local) ETS 0.0883 -372. 773. 779. 811. 37.9 88.9 0.222
CabbageMNM <- Cabbage_constant_prices %>% model(MNM = ETS(Average_constant_prices ~ error("M") + trend("N") + season("M")))
CabbageMNM %>% gg_tsresiduals()
CabbageMNM %>%
tidy()
## # A tibble: 15 × 4
## Commodity .model term estimate
## <chr> <chr> <chr> <dbl>
## 1 Cabbage(Local) MNM alpha 0.974
## 2 Cabbage(Local) MNM gamma 0.000102
## 3 Cabbage(Local) MNM l[0] 18.7
## 4 Cabbage(Local) MNM s[0] 1.19
## 5 Cabbage(Local) MNM s[-1] 0.846
## 6 Cabbage(Local) MNM s[-2] 0.584
## 7 Cabbage(Local) MNM s[-3] 0.599
## 8 Cabbage(Local) MNM s[-4] 0.502
## 9 Cabbage(Local) MNM s[-5] 0.583
## 10 Cabbage(Local) MNM s[-6] 0.747
## 11 Cabbage(Local) MNM s[-7] 0.931
## 12 Cabbage(Local) MNM s[-8] 1.59
## 13 Cabbage(Local) MNM s[-9] 1.56
## 14 Cabbage(Local) MNM s[-10] 1.56
## 15 Cabbage(Local) MNM s[-11] 1.30
Cabbage_ExponentialSmoothing %>% glance()
## # A tibble: 4 × 10
## Commodity .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cabbage(Local) MNM 0.0883 -372. 773. 779. 811. 37.9 88.9 0.222
## 2 Cabbage(Local) holt 57.9 -402. 814. 815. 827. 55.4 119. 5.45
## 3 Cabbage(Local) ses 56.3 -402. 810. 810. 818. 55.1 119. 5.40
## 4 Cabbage(Local) ETS 0.0883 -372. 773. 779. 811. 37.9 88.9 0.222
From the forecasts and residuals we would like to conclude that the MNM model beats the best simple method, the random walk with drift. Note, however, that the accuracy output below comes back empty: the 12-month forecast extends beyond the end of the data, so accuracy() has no actual values to score against.
cabbage_forecast2 <- CabbageMNM %>%
forecast(h = 12)
# The forecast horizon lies beyond the available data, so there are no
# actuals to compare against and accuracy() returns an empty table.
accuracy(cabbage_forecast2, Cabbage_constant_prices) %>%
summarise(.model = .model[which.min(RMSE)], RMSE = min(RMSE))
## # A tibble: 0 × 2
## # ℹ 2 variables: .model <chr>, RMSE <dbl>
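One way to obtain non-empty accuracy tables is to hold out the final months as a test set before fitting. A sketch under the same object and column names as above (not run here):

```r
# Hold out the last 12 months, fit on the rest, score against the held-out data.
cabbage_train <- Cabbage_constant_prices %>%
  slice(1:(n() - 12))
cabbage_train %>%
  model(MNM   = ETS(Average_constant_prices ~ error("M") + trend("N") + season("M")),
        Drift = RW(Average_constant_prices ~ drift())) %>%
  forecast(h = 12) %>%
  accuracy(Cabbage_constant_prices) %>%
  select(.model, RMSE, MAE, MAPE)
```

Because the forecast period now overlaps the observed data, accuracy() can compute test-set metrics and the two models can be compared directly.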
The best automatic ARIMA model is ARIMA(2,0,0)(1,1,0)[12]. Breaking down what this implies:
1. Autoregressive (AR) terms (p = 2): two autoregressive terms. This is expected, since the exponential smoothing model showed that prices depend heavily on recent values.
2. Differencing (d = 0): the series appears stationary, so no non-seasonal differencing is needed. This is unusual for economic series, most of which are integrated of order 1.
3. Moving average (MA) terms (q = 0): no past errors are needed to forecast the series.
4. Seasonal autoregressive (SAR) terms (P = 1): the current observation is modeled as a linear combination of the observation from the same season a year earlier. The plots showed a strong seasonal component, especially towards the end of the year when prices tend to increase.
5. Seasonal differencing (D = 1): the seasonal analogue of d; the series is differenced at the seasonal lag because of its strong seasonal component.
6. Seasonal moving average (SMA) terms (Q = 0): as with q, no past seasonal errors are needed.
7. Seasonal period ([12]): the data is monthly and the seasonal pattern repeats every year.
fit_arima <- potato_constant_prices |>
model(arima = ARIMA()) |>
report()
## Model not specified, defaulting to automatic modelling of the
## `Average_constant_prices` variable. Override this using the model formula.
## Series: Average_constant_prices
## Model: ARIMA(2,0,0)(1,1,0)[12]
##
## Coefficients:
## ar1 ar2 sar1
## 1.0756 -0.2719 -0.4160
## s.e. 0.1131 0.1113 0.1198
##
## sigma^2 estimated as 15.56: log likelihood=-229.19
## AIC=466.38 AICc=466.9 BIC=476.01
The following table compares the ARIMA model with the previous models: exponential smoothing, seasonal naive, naive, drift and mean. The four simplest models do not provide the best estimates, and moreover their residuals are not white noise. Between ARIMA and ETS the decision is narrower: both perform well and their residuals resemble white noise. However, ETS performs better on every metric for the 1-month forecast and on RMSE and RMSSE for the 12-month forecast. Therefore, we conclude that the best model to forecast the potato prices is ETS.
fit_cv <- potato_constant_prices |>
stretch_tsibble(.init = 18, .step = 1) |>
model(
Seasonal_naive = SNAIVE(Average_constant_prices),
Naive = NAIVE(Average_constant_prices),
Drift = RW(Average_constant_prices ~ drift()),
Mean = MEAN(Average_constant_prices),
ETS = ETS(),
Arima = ARIMA())
## Model not specified, defaulting to automatic modelling of the `Average_constant_prices` variable. Override this using the model formula.
## Model not specified, defaulting to automatic modelling of the `Average_constant_prices` variable. Override this using the model formula.
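The rolling-origin scheme behind `stretch_tsibble(.init = 18, .step = 1)` can be sketched in base R: fold k trains on the first 17 + k observations, and the accuracy below is averaged over the forecasts made from each of those origins. The series length used here is hypothetical:

```r
n    <- 24   # hypothetical series length
init <- 18   # size of the first training window
step <- 1    # grow the window by one observation per fold

# Each fold's training set is observations 1..end; forecasts are
# then made for the observations after `end`
folds <- lapply(seq(init, n, by = step), function(end) 1:end)

length(folds)          # 7 folds
sapply(folds, length)  # 18 19 20 21 22 23 24
```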
fc_cv <- fit_cv %>%
forecast(h=12)
suppressMessages({fc_cv %>%
accuracy(potato_constant_prices) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 6 × 7
## .model .type RMSE MAE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Arima Test 8.90 6.28 22.6 0.882 0.986
## 2 Drift Test 13.3 10.2 38.4 1.43 1.47
## 3 ETS Test 8.31 6.47 24.7 0.909 0.920
## 4 Mean Test 10.4 7.64 28.6 1.07 1.15
## 5 Naive Test 12.5 9.40 35.4 1.32 1.39
## 6 Seasonal_naive Test 8.71 6.80 24.4 0.955 0.965
fc_cv <- fit_cv %>%
forecast(h=1)
suppressMessages({fc_cv %>%
accuracy(potato_constant_prices) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 6 × 7
## .model .type RMSE MAE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Arima Test 4.58 3.47 12.7 0.487 0.507
## 2 Drift Test 5.58 4.24 15.3 0.596 0.618
## 3 ETS Test 4.22 3.23 11.6 0.453 0.468
## 4 Mean Test 10.3 7.73 30.4 1.09 1.14
## 5 Naive Test 5.54 4.21 15.2 0.590 0.613
## 6 Seasonal_naive Test 9.01 7.08 26.5 0.995 0.998
Regarding the cabbage series, the data is stationary and needed no differencing. Looking at the ACF, we have 2 significant seasonal lags and 7 non-seasonal lags; in the PACF we have 4 significant lags and no seasonal lags. So we can take the easy route and fit an ARIMA(4,0,0). When we compare our choice of model with the model chosen by the automatic search, we see that our manual choice did not pick up the seasonal structure.
The automatic search chose ARIMA(4,0,0)(2,0,0)[12], a seasonal ARIMA model with seasonal terms P = 2 (two seasonal autoregressive terms at lag 12), D = 0 (no seasonal differencing) and Q = 0 (no seasonal moving average terms).
Cabbage_constant_prices %>% gg_tsdisplay(Average_constant_prices, plot_type = "partial")
cabbage_fit <- Cabbage_constant_prices %>%
model(arima400 = ARIMA(Average_constant_prices ~ pdq(4,0,0)),
auto = ARIMA(Average_constant_prices, stepwise = FALSE, approx = FALSE))
glance(cabbage_fit) %>%
arrange(AICc) %>%
select(.model:BIC)
## # A tibble: 2 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 auto 40.6 -305. 627. 628. 647.
## 2 arima400 42.8 -308. 628. 629. 643.
As an alternative forecasting method, this time series will be forecasted with Prophet, developed by Taylor and Letham (2017) at Meta Platforms, Inc. (formerly Facebook, Inc.). The code follows the Quick Start R API retrieved from https://facebook.github.io/prophet/docs/quick_start.html#r-api.
Prophet requires a data frame with two columns: "ds", a date column, and "y", the numeric column to forecast.
df <- potato_constant_prices |> as_tibble() |>
mutate(ds = Date,
y = Average_constant_prices) |>
select(ds, y)
The prophet function fits a model to the series.
m <- prophet(df, seasonality.mode = 'multiplicative')
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
The following table shows the last values of the forecast made with Prophet. Note that even though the data is monthly, Prophet produces a daily forecast here, because make_future_dataframe defaults to daily periods.
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
tail(forecast[c('ds', 'yhat', 'yhat_lower', 'yhat_upper')])
## ds yhat yhat_lower yhat_upper
## 454 2022-04-26 33.26628 27.27618 39.56348
## 455 2022-04-27 31.47488 24.98795 37.55620
## 456 2022-04-28 30.28656 24.09888 36.50826
## 457 2022-04-29 29.76002 23.10438 35.82801
## 458 2022-04-30 29.93481 23.78214 36.00040
## 459 2022-05-01 30.83011 24.60785 37.00140
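Since the series is monthly, a monthly future frame would be more appropriate than the daily default (Prophet's `make_future_dataframe` accepts a `freq = "month"` argument for this, as used in the cross-validation code further below). The date grid such a call would append can be built with base R's `seq()`; the last observed month used here is hypothetical:

```r
# Hypothetical last observed month; the real value would be the
# final entry of df$ds
last_obs <- as.Date("2021-05-01")

# Twelve monthly timestamps following the last observation: the
# same grid a monthly future dataframe would append
future_months <- seq(from = last_obs, by = "month", length.out = 13)[-1]

head(future_months, 3)  # "2021-06-01" "2021-07-01" "2021-08-01"
length(future_months)   # 12
```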
The following plot shows the actual vs. fitted values and a 1-year forecast. The model fits the actual values well. However, the forecast does not look good: it fails to follow the seasonal pattern of the data, with higher prices at the end of the year; it mostly tracks noise and does not seem plausible.
plot(m, forecast)
Finally, the RMSE is calculated using the same rolling window used for the fable models and a 1-month forecast. Since Prophet does not provide a way to do this as conveniently as fpp3, the code is a little verbose. The RMSE is 7.16, which is higher than ETS and ARIMA for the 1-month forecast. As measured by RMSE this method is worse than ETS, ARIMA, drift and naive, which was expected from the plots: the model fails to capture the seasonality and does not provide a good forecast compared to ETS and ARIMA.
init_obs <- 18
# Number of rolling-origin folds: each fold forecasts the single
# observation immediately after its training window
n_folds <- nrow(df) - init_obs
prophet_rolling_cv <- function(df, init_obs, n_folds) {
cv_results <- numeric(n_folds)
for (i in 1:n_folds) {
train <- df[1:(init_obs + i - 1), ]
test <- df[(init_obs + i):(init_obs + i), ]
# Fit a Prophet model on the training data
suppressMessages({ m <- prophet(train) })
suppressMessages({ future <- make_future_dataframe(m, periods = nrow(test), freq = "month") })
forecast <- predict(m, future)
# RMSE of the one-step-ahead forecast for this fold
cv_results[i] <- sqrt(mean((forecast$yhat[-(1:nrow(train))] - test$y)^2))
}
mean(cv_results, na.rm = TRUE)
}
suppressMessages({ cv_rmse <- prophet_rolling_cv(df, init_obs, n_folds) })
print(paste0("Prophet RMSE:",cv_rmse))
## [1] "Prophet RMSE:7.16472563408317"
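The error measure inside the loop is the standard RMSE; as a sanity check, it can be isolated into a small helper and verified on toy numbers:

```r
# Root mean squared error between actual and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy check: the errors are 0, 0 and -2, so RMSE = sqrt(4/3)
rmse(c(1, 2, 3), c(1, 2, 5))  # ~1.1547
```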
To forecast the carrot prices with another method, we again use the Prophet forecasting procedure, which is fast and provides fully automated forecasts that can be tuned by hand by analysts.
First, we import prophet:
library(prophet)
Then we need a data frame with columns ds and y, containing the date and the numeric value respectively.
df <- Carrot_constant_prices %>%
as_tibble() %>%
mutate(ds = Date, y = Average_constant_prices) %>%
select(ds, y)
Then we call the prophet function to fit the model.
m <- prophet(df, seasonality.mode = 'multiplicative')
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Predictions are made on a dataframe with a column ds containing the dates for which predictions are to be made. The make_future_dataframe function takes the model object and a number of periods to forecast and produces a suitable dataframe.
future <- make_future_dataframe(m, periods = 365)
tail(future)
## ds
## 454 2022-04-26
## 455 2022-04-27
## 456 2022-04-28
## 457 2022-04-29
## 458 2022-04-30
## 459 2022-05-01
Then we use the generic predict function to get our forecast. The forecast object is a dataframe with a column yhat containing the forecast. It has additional columns for uncertainty intervals and seasonal components.
forecast <- predict(m, future)
tail(forecast[c('ds', 'yhat', 'yhat_lower', 'yhat_upper')])
## ds yhat yhat_lower yhat_upper
## 454 2022-04-26 18.11052 2.883870 33.22649
## 455 2022-04-27 20.12439 4.710410 35.48565
## 456 2022-04-28 22.57022 6.574426 38.63204
## 457 2022-04-29 25.38847 10.134016 39.96117
## 458 2022-04-30 28.50965 12.727900 44.24279
## 459 2022-05-01 31.85643 16.236226 47.87040